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Abstract 

Background: As today, there are hundreds of targeted therapies for the treatment of cancer, many of which have 
companion biomarkers that are in use to inform treatment decisions. If we would consider this whole arsenal of 
targeted therapies as a treatment option for every patient, very soon we will reach a scenario where each patient is 
positive for several markers suggesting their treatment with several targeted therapies. Given the documented side 
effects of anticancer drugs, it is clear that such a strategy is unfeasible. 

Results: Here, we propose a strategy that optimizes the design of combinatorial therapies to achieve the best 
response rates with the minimal toxicity. In this methodology markers are assigned to drugs such that we achieve a 
high overall response rate while using personalized combinations of minimal size. We tested this methodology in 
an in silico cancer patient cohort, constructed from in vitro data for 714 cell lines and 138 drugs reported by the 
Sanger Institute. Our analysis indicates that, even in the context of personalized medicine, combinations of three or 
more drugs are required to achieve high response rates. Furthermore, patient-to-patient variations in 
pharmacokinetics have a significant impact in the overall response rate. A 10 fold increase in the pharmacokinetics 
variations resulted in a significant drop the overall response rate. 

Conclusions: The design of optimal combinatorial therapy for anticancer treatment requires a transition from the 
one-drug/one-biomarker approach to global strategies that simultaneously assign makers to a catalog of drugs. The 
methodology reported here provides a framework to achieve this transition. 
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Background 

Personalized cancer therapy has been proposed as the next 
battle in the war on cancer and targeted therapies as the 
new warfare machinery [1], Targeted therapies are designed 
to treat cancers carrying specific molecular alterations. In 
turn these molecular alterations can be used as companion 
biomarkers to inform the decision of using, or not using, 
the targeted therapy to treat a patient [2]. For example, in 
the context of breast cancer, the level of the receptor tyro- 
sine kinase HER2Mew is used to select trastuzumab 
(Herceptin; Genentech) as adjuvant therapy [3]. 

By design, a targeted therapy is expected to be effective in 
a subset of cancer patients (e.g., trastuzumab in HER2Mew 
positive breast cancer patients). However, even within this 
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subset, the long-term response may be reduced. Some pa- 
tients may initially respond to the targeted therapy but later 
on regress due to the occurrence of secondary molecular al- 
terations. For example, in the context of melanoma, cancers 
with the BRAF(V600E) mutation can be treated with 
vemurafenib (Zelboraf, Plexxikon) resulting in outstanding 
response [4], However, in about one year most patients re- 
gress due to upregulation of compensatory pathways [5,6]. 
The molecular background of a cancer can also modulate 
the response to a targeted therapy, even when treatment is 
suggested by the biomarker. For example, as a difference 
with melanoma patients, colon cancer patients harbouring 
the same BRAF(V600E) mutation show a very limited re- 
sponse to vemurafenib [7]. One mechanism explaining this 
difference is the feedback activation of EGFR upon treat- 
ment with vemurafenib and the fact that EGFR levels are 
higher in colon cancer than in melamoma cells [8]. 

Although targeted therapies may fail as single agents, 
they can still be effective when used in combination with 
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other agents. Combinatorial therapy is a rational ap- 
proach to overcome the failure of single drugs [9]. One 
hypothesis is that one agent in the combination can 
cover for the caveats of other agents, increasing the 
response rate [10]. As for the case of single agents, 
biomarkers can be used to inform the inclusion of 
targeted therapies in a drug combination, which we name 
personalized combinatorial therapy. 

The shift from single drug targeted therapy to combina- 
torial personalized therapies introduces a new challenge. As 
today, there are hundreds of targeted therapies with their 
associated biomarkers, some of which are already in use to 
inform treatment decisions. If we would consider the whole 
arsenal of targeted therapies as a treatment option for every 
patient, very soon we will reach a scenario where each 
patient is positive for several markers suggesting their 
treatment with several targeted therapies [11]. Given 
the documented side effects of anticancer drugs, it is 
clear that such a strategy is unfeasible. A new strat- 
egy is needed to optimize the design of combinatorial 
therapies to achieve the best respond rates with the 
minimal toxicity. In this work we introduce a methodology 
to achieve this goal. 

Results and discussion 

The shift from single drug targeted therapy to personal- 
ized combinatorial therapies introduces a new challenge. 
We need to define a protocol to design the personalized 
combinations given a catalog of drugs, a catalog of 
markers and the status of those markers in the patients 
cancer. To formally address this problem we introduce 
the scheme depicted in Figure 1. We are given as input a 
cohort of patients together with the status of m markers 
in those patients. To be more precise, the markers status 
of each patient is represented by a barcode or Boolean 
vector Xi=(xa,...jCi m ), where Xu=l when marker / is ob- 
served in patient i and 0 otherwise. We are also given as 
input a set of drugs that are available for anticancer 
treatment. In the context of personalized medicine we 
would like to assign markers to a drug to identify the pa- 
tient subpopulation with the best response rates. Again, 
to be precise, the marker assignment to each drug is 
represented by a barcode or Boolean vector Yj={yj lf ... } 
yj m ), where y ; -/=l if marker / is used to inform the treat- 
ment with drug / and 0 otherwise. A drug-to-sample 
protocol fj(X if Yj) is used to inform the treatment options, 
where fj{X b Yj)=^ indicates to consider drug / as a treat- 
ment option for sample i and Jj(X it Yj)=0 otherwise. For ex- 
ample, Figure 1 illustrates the protocol where fj(XiXj}=l if 
the sample and the drug share a marker in common. Once 
the treatment options are determined for each sample, we 
then apply a patient protocol g to choose the personalized 
therapies for each patient. For example, Figure 1 illustrates 
the protocol g indicating the treatment with the drug with 



1- Input: patients □, patient marker status X, drugs O 

^=(1, 0,0,1 ,0) □ 
X 2 =(0, 0,0,1 JO) □ 

□ 



X 3 =(0,0,1,0,0) 
X 4 =(0,0,0,0,1) 
X 5 =(l, 0,0,1,0) 



o 
o 
o 
o 



2- Assign markers to drugs Y 

X 1= (l, 0,0,1,0) □ 
X 2 =(0,0,0,1,0) □ 
X 3 =(0,0,1,0,0) □ 
X 4 =(0,0,0,0,1) □ 
X 5 =(l, 0,0,1,0) □ 

3- Determine treatment options 

X^d, 0,0,1,0) □ 

X 2 =(0,0,0,1,0) □ 

X 3 =(0,0,1,0,0) □ 

X 4 =(0,0,0,0,1) □ Q 

Jf 5 =(l, 0,0,1,0) □ 

4- Choose treatments 

X 1= (l, 0,0,1 ,0) B< 

X 2 =(0,0,0,1,0) EX 

X 3 =(0,0,1,0,0) 
X 4 =(0,0,0,0,1) □ 
X 5 =(l, 0,0,1,0) n+ 

L 



O F,=(l, 1,0,0,0) 

o y 2 =(o,o,i,o,o) 

o y 3 =(o ,0,0,1,0) 

o r 4 =(o,o,o,o,i) 




y,=( 1,1 ,o,o,o) 

-o r 2 =(o,o,i,o,o) 
y 3 =(o,o,o,i,o) 
-o y 4 =(o,o,o,o,i) 




y,=(i, 1,0,0,0) 

y 2 =(n,o,i,o,o) 
y 3 =(0 ,0,0,1,0) 
o y 4 =(o,o,o,o,i) 



Optimize Y for maximum response rates 

Figure 1 Optimization of personalized therapies. 1- We are 

given as input a set of patients, Boolean vectors reporting the 
markers status on each patient (X) and a set of drugs available for 
treatment. 2- A Boolean vector reporting the markers that will be 
used to inform treatment is specified for each drug (Y). 3- Drugs are 
suggested for the treatment of each patient using a drug-to-sample 
protocol depending on the sample and drug markers {fj(X h Yj)). In this 
example a drug is suggested for the treatment of a patient 
whenever they share at least one marker. 4- Finally, a sample 
protocol is used to specify the treatment to each patient (g). In this 
example the best treatment for each sample is selected. Finally, we 
optimize the marker assignments to drugs (Y), the drug-to-sample 
protocols (fj{XjXj)) and the sample protocol (g) to obtain the maximum 
overall response rate, as represented in the figure by the arrow. 



highest expected response rate among the treatment 
options identified for each patient {gbestxf- Another possibil- 
ity is to treat with the c drugs with the higher response 
rates among those suggested for each patient (gbestj- 



Vazquez BMC Systems Biology 201 3, 7:31 
http://www.biomedcentral.eom/1752-0509/7/31 



Page 3 of 1 1 



The current approach to targeted therapies is to assign 
markers to drugs based either on the target for which 
the drug was developed or some preliminary study 
suggesting an increase response rate in patients having 
the marker. We take a more general approach where the 
markers are assigned to drugs to maximize the response 
rate to therapy. To this end, we define the following 
optimization problem: 

Find the drug marker assignments Yp the drug-to-sample 
protocols fj and sample protocol g that maximize the over- 
all response rate O. 

Response model 

To calculate O we require the probability that each pa- 
tient responds to a drug when the drug is used as a sin- 
gle agent and some quantification of drug interactions. 
In the simplest scenario where there are no drug interac- 
tions, the probability P t that a patient responds to is per- 
sonalized therapy is given by the probability that it 
responds to at least one of the drugs on its personalized 
combination 

Pt = i-n (i-^y* 

7=1 

where e i; =l if drug / is included in the personalized ther- 
apy of patient i and p t j is the probability that patient i re- 
sponds to drug ; when the latter is used as a single 
agent. When interactions are present we can improve on 
(Eq. 1) after adding correction terms accounting for 
two-drug interactions and so on 

d d-l d 

p. = !_£/=! ;=i k=j+i 

(2) 

In this equation values of Jjk<0 will result in response 
rates higher than what expected if the drugs do not 
interact (synergy) while values of 7// c >0 will result in re- 
sponse rates lower than what expected if the drugs do 
not interact (antagonism). We note that antagonism 
could take place at the level of pharmacodynamics (an- 
tagonism at the cellular level) or at the level of pharma- 
cokinetics (antagonism at the drug metabolism level) 
and the latter may result in increased toxicity. 

The average of P t across samples defines the overall 
response rate O of the personalized combinatorial 
therapies 

°=- s t, p i ( 3 ) 

i=l 

We are aware of documented examples of drug inter- 
actions in the context of cancer treatment [12]. 



However, for most combinations we do not have a quan- 
titative estimate of how these interactions affect the re- 
sponse rate. For the purpose of illustrating our 
methodology, we will use the non-interacting drugs ap- 
proximation (Eq. 1) in our simulations. 

Response-by-marker approximation 

In the clinical practice we cannot test the response of 
each cancer patient to each approved anticancer drug. 
However, we can estimate the response rate to a drug 
depending on the present/absence of the markers 
assigned to that drug. For example, let us consider the 
case where Kj markers are used to inform the treatment 
with drug The patients are divided into 2 K ' groups de- 
pending on the status of those markers. We can conduct 
a clinical trial to estimate the response rate q(j,s) of drug 
/ for each group of patients. Once the q(j,s) are known, 
we can estimate the response rate to any patient. To be 
more precise we enumerate the patient groups using the 
index 

sj(x) = X>/,2*- 1 (4) 
k=l 

where ^/ 7i , lj K ^j is the list of markers assigned to drug 

/ and Xi is the status of the /-th marker. Using this nota- 
tion we obtain the response by-marker approximation 

p 9 = q(j,sj( x t)) (5) 

In short, the probability that a given patient i responds 
to a given drug / is approximated by the estimated frac- 
tion of patients that responds to that drug within the 
group of patients having the same status as patient i for 
the markers assigned to drug 

Finding the optimal personalized combinations 

We need some procedure to find the optimal treatment 
combinations. In the Methods section we report a simu- 
lated annealing algorithm that performs an exploration 
of the space of markers assigned to drugs and drug-to- 
sample protocols with a gradual increased bias towards 
improvements on the overall response rate. Although 
this algorithm may not find the optimal solution, it can 
provide a good approximation to hard computational 
problems [13]. 

Updating the drug-to-sample protocols 

During the optimization procedure we need to explore 
different marker assignments to drugs and different 
choices of drug-to-sample protocols. To this end we 
need some precise representation of the Boolean func- 
tions and the transformations among them. The drug- 
to-sample protocols are represented by a Boolean function 
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fjiXiXj) that returns 0 (do not suggest) or 1 (suggest) de- 
pending on the status of the markers assigned to the drug 
on a given sample. For computational convenience it is 

easier to write the Boolean functions as fj(Xi,Yj) = fj 

(Xfa, ...,Xu K ^j, where Kj is the number of markers 
assigned to drug ...Ikj is the list of markers assigned 
to drug / and fj is a Boolean function of Kj inputs. Given K 
markers there are 2 k possible input states (#1,...,%), which 

K 

can be enumerated as follows: a(x) = ^\/ c 2 /c_1 . For each 

k=l 

of these input states we can set the output o fl to 0 or 1. 
We can enumerate the Boolean functions with K inputs 

2 K -l 

using the mapping b{6) = °a^ a ~ l . Therefore, we can 

represent every Boolean function with two indexes (K,b), 
the first one denoting the number of inputs and the sec- 
ond one the specific Boolean function with /C-inputs. 

Figure 2a and b show all Boolean functions with one 
(l,b) and two (2,b) inputs, respectively. Each Boolean 
function is represented by a truth-table where for each 



imput the output 0 or 1 is specified. The letters A and B 
are used to denote the inputs and the b index of each 
function is indicated on the upper raw of the truth-table. 
We note that functions where the output is independent 
of at least one input are not considered, because they 
can be reduced to a simpler function. For example func- 
tion (1,0) (Figure 2a) is equivalent to have no markers 
assigned and function (2,3) (Figure 2b) is equivalent to 
(1,1) (Figure 2a) after removing the marker B. 

To explore different Boolean functions we change 
the function, add a new marker or remove one marker. 
When changing a Boolean function, (K,b)—>(K,b') f a 
new function is selected at random among all consid- 
ered Boolean functions with the same number of in- 
puts. When removing a marker, (7<T,&)— >(/<T-l,& , )> if the 
drug has one marker then we remove it, the drug will 
have no markers assigned and, therefore, it will not be 
considered for the treatment of any patient. If the drug 
has two markers assigned then we remove one of the 
two markers and use the transformations illustrated in 
Figure 2c and d. For example, in Figure 2c we start 
with the function (2,2) and remove the B (right) input. 
For this function the output is always 0 when the A 
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c) Remove right-marker 
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f) Remove left-marker 




(1,1) 



(1,2) 




(1,1) 



(1,2) 



Figure 2 Boolean functions and operations among them, a) Boolean functions with one input. Functions with a dashed line are not 
considered because the output is independent of the input, b) Boolean functions with two inputs. Functions with a dashed line are not 
considered because the output is independent of at least one input, c) An example showing the removal of the right marker (B) from function 
(2,2), which can be either result into function (1,0) or (1,1). Since function (1,0) is excluded then the function (1,1) is always chosen, d) Same 
example but removing the left marker (A), e) All mappings from (2,b) to 0,b') following removal of the right (B) marker, f) All mappings from {2,1 
to (1,6') following removal of the left marker, e) All mappings from (1,6) to {2,b') following the addition of a marker. For simplicity, we have 
chosen the reverse of the right-marker removal (panel e) as the mapping for the marker addition. 
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(left) input is 1 but the output can be 0 or 1 when the 
A input is 0. Therefore, (2,2) can be mapped to (1,0) or 
(1,1) after removing the B input. Since the output of 
(1,0) is independent of the input state it is not consid- 
ered. A similar reasoning can be applied to obtain the 
mappings for function (2,2) when removing the A marker 
instead (Figure 2d). Applying this approach to every (2,Z?) 
function we obtain the mappings in Figure 2e and f. Fi- 
nally, if a marker is added, (K,b)^>(K+l,b'), then we use 
the mappings in Figure 2g, which are the reverse of (K-l, 
b')^>{K>b) removing the A (left) input. In all cases, when 
more that one choice is available we choose one of them 
with equal probability. 

Case study 

To test our methodology we investigate an in silico case 
study where we can actually quantify the response of 
each sample to each drug. The in silico case study is 
based on in vitro growth inhibition data reported by the 
Sanger Institute [14]. In the Sanger screen 714 cell lines 
were tested for their responses against 138 drugs. For 
several sample-drug pairs the natural logarithm of the 
drug concentration to achieve a 50% growth inhibition 
relative to untreated controls (logIC50) was reported. 
The logIC50 data is missing for 26,031 drug-cell line 
pairs, representing 20% of all drug-sample pairs. The 
missing logIC50 data was imputed using the weighted 
average approach described in the Methods section. The 
Pearson Correlation Coefficient (PCC) between the im- 
puted and actual log50s, when the latter were available, 
was 0.89 (Figure 3). 




-20 -10 0 10 20 



Reported logIC50 

Figure 3 Comparison between the imputed and reported 

loglC50 data. The solid line represents the case when the imputed 

and reported values coincide. 
\ ) 



For each cell line the cancer subtype and the status of 47 
cancer related genes was also reported, including somatic 
mutations and copy number alterations. We use as markers 
the observation of a specific cancer type (e.g., breast can- 
cer), somatic mutations (e.g., TP53:wild-type, TP53:R175H, 
TP53:R248Q, etc.), and copy number alterations (gene:-, 
gene:0, gene:+ for deletion, normal and amplification, re- 
spectively). This procedure resulted in 921 markers. Among 
those, we retained 181 markers that are observed in at least 
10 cell lines. 

To each cell line we associate a sample that is fully 
composed of that cell line. We assume that different 
drugs are used at different treatment doses because they 
are active at different concentration ranges. The mean 
logIC50 of a drug across cancer cell lines is a good esti- 
mate of the typical concentration for the drug activity in 
this in vitro setting. Thus, for each drug we set the treat- 
ment log-concentration y y =mean(logIC50) y +log/z, where 
h represents the fold change in the dose. Values of h 
below 1 represent low dose therapy, while those above 1 
represent high dose therapy. In average, cancer cells 
have IC50s that are about 2 fold lower than those of nor- 
mal cells [15]. Based on this we assume that the highest 
tolerated dose is /z=2, and that is the dose used for 
treatment. 

We assume that due to variations in drug delivery the 
actual log-dose reaching the cancer cells, denoted by Z y , 
is different from j y . Pharmacokinetic variables generally 
follow a normal distribution after a log-transformation 
[16] and, therefore, we assume that Z y (the log-dose) is a 
random variable following a normal distribution, with 
mean yj and variance a. Here a models variations associ- 
ated with drug pharmacokinetics in patients. Pharmaco- 
kinetic parameters characterizing the steady state plasma 
drug concentrations and drug clearance rates can vary as 
much as 2-10 fold [17,18]. To model such variations we 
will use a=l,10. 

We define a response as the achievement of at least 
50% growth inhibition. In this case a sample responds to 
a drug if Z y >logIC50^ and does not respond otherwise. 
Under these assumptions, the probability pij that sample 
i responds to drug ; is given by 

_ 1 ( logIC50y-mean( logIC50) ■- log/^ 



where erfc(#) is the complementary error function. 
When the cell line loglCSO^ is much higher than the 
treatment dose reaching the cancer cells (logIC50^- 
yf»d) then p^O. In contrast, when the cell line 
logIC50^y is much lower than the treatment dose 
reaching the cancer cells (logIC50 iy -y y <<cr) then p^l. 

To test a more realistic scenario, we are not going to 
use the response probabilities in (Eq. 6). Instead, we are 



Vazquez BMC Systems Biology 201 3, 7:31 
http://www.biomedcentral.eom/1752-0509/7/31 



Page 6 of 1 1 



going to use the response by-marker approximation in 
(Eq. 5). To this end, given a drug and its assigned 
markers, we divide the cell lines into groups depending 
on the status of those markers, and estimate the re- 
sponse probability of q(j,s) as the average of p t j over all 
cell lines in that group. To avoid biases from small 
group sizes, we set #(/,s)=0 for any group with less than 
10 samples. 

We do not have an estimate of the possible interac- 
tions between the 138 drugs in this in silico study. We 
assume that the drugs do not interact and we approxi- 
mate the response to a personalized drug combination 
by (Eq. 1), but replacing p t j by the response by-marker 
approximation (Eq. 5). 

In the optimization problem defined above we could 
attempt to optimize the marker assignments to drugs, 
the drug-to-sample protocols fj{X b Yj) and the sample 
protocol g. However, to reduce the computational com- 
plexity of the problem, we will impose the sample proto- 
col gbest,c> assign at most two markers to each drug and 
optimize over marker assignments to drugs and the 
drug-to-sample protocols. 

Using the simulated-annealing algorithm we obtained 
the optimal personalized therapies for the in silico co- 
hort. In general we have no way to warranty that the 
simulated-annealing algorithm did not get stuck at a 
local minimum, precluding it from finding the optimal 
solution. However, by starting at different initial assign- 
ments of markers/Boolean-functions and monitoring the 
improvement on the solutions found we can get an idea 
of how close we are from the optimal solution. Figure 4 
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Figure 4 Convergence of the simulated-annealing algorithm for 

the in silico study. The overall response rate (as estimated with the 
by-marker approximation, 0*) as a function of the number of initial 
conditions tried. 



shows the highest overall response rate (as estimated 
with the by-marker approximation, O*) as more initial 
conditions were tested. There are no significant im- 
provements between a 100 and 1,000 initial condi- 
tions indicating that the simulating-annealing 
algorithm is close to the optimal solution. 

We note that in this study we count with the actual 
response probability of each cell line to each drug. 
Therefore, we can use as input the optimal personalized 
combinations obtained by using the response by-marker 
approximation (Eq. 5) and then calculate the overall re- 
sponse rate using the original cell line response rates 
(Eq. 6). 

When the pharmacokinetic variations are small (cr=l), 
the predicted overall response rate is as high as 90% when 
treating with personalized therapies using one drug alone. 
Then it increases towards 100% as we move to personal- 
ized combinations using more drugs (Figure 5a). However, 
a 10-fold increase in the pharmacokinetic variations 
(cr=10) results in a drop of the overall response rate to 
about 60% when treating with one drug alone (Figure 5a). 
This observation indicates that the success of personalized 
therapy will also depend on the magnitude of pharmacoki- 
netic variations and on our ability to personalize the drug 
dosage for each patient to counteract those pharmacoki- 
netic variations. 

We note that not all drugs are included in the treat- 
ment of at least one sample, resulting in a smaller effect- 
ive drug catalog (Figure 5b). For all the maximum 
combination sizes tested, less than 80 out of 138 (58%) 
of the drugs are needed. Furthermore, beyond personal- 
ized combinations of three drugs, we observe a decrease 
in the number of needed drugs as we increased the max- 
imum allowed combination size (Figure 5b). This obser- 
vation suggests that the need for only 58% of the drugs 
will hold for larger combination sizes. We note that the 
decrease of the needed drugs is unexpected. For ex- 
ample, if the response rates were independent identically 
distributed random variables then the probability that a 
drug is selected for the treatment of a samples is c/d, the 
probability that a drug is selected for the treatment of at 
least one sample is l-(l-c/d) s and the average number of 
drugs used for the treatment of at least one sample is 
d* = d[l - (1 - c/d) s ]. Therefore, for independent identi- 
cally distributed response rates d* increases monoton- 
ically with increased the combination size c. The 
departure from this expectation in Figure 5b could be 
due to the existence of correlations in the response 
rates of different drugs when treating different cells 
lines. Furthermore, we cannot exclude that for large c 
the simulated-annealing algorithm gets trapped in local 
optima and that for the actual global optimal d* does 
increases with increasing c. In any event this discrep- 
ancy should motivate future work to obtain theoretical 
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estimates for d* based on the patterns of correlations 
between the response rates and the ability of the 
simulating-annealing algorithm to reach the global 
optimum. 

In Table 1 we report the effective drug catalog for the 
small pharmacokinetic variations case (o*=l) and maximum 
combination size c=3 drugs. In addition, we report whether 
those drugs were included in the catalogs for c-1 and 2, 
showing the percent of samples treated when included and 
(-) otherwise. Most drugs in the c=3 catalog are also in- 
cluded in the c=l and 2 catalogs, indicating that there is a 
core set of drugs that is relevant independent of the max- 
imum combination size allowed. The percentage of sam- 
ples treated with a given drug in the catalog increases from 
c=l to 3. This effect can be explained by the fact that, as 
we allow combinations of more drugs, a drug can be in- 
cluded in personalized combinations as a second or third 
choice. 

We note that in some instances the marker assigned 
to a drug coincides with what expected given the known 
drug target (Table 1, Markers and Target columns). For 
example, the marker TP53:wt (i.e., TP53 wild-type) is 
suggested to inform the treatment with nutlin-3a. This 
makes sense because nutlin-3a releases TP53 from the 
inhibition by its negative regulator MDM2 and the out- 
come of nutlin-3a treatment is modulated by the TP53 
status [19]. In another case, the marker BRAF:V600E is 
assigned to the BRAF inhibitor PLX4720 [20]. The 
marker KRAS:G12D is assigned to another BRAF inhibi- 
tor, AZ628, which still makes sense because KRAS is just 
upstream of BRAF in the RAS/RAF/MAPK/ERK signal- 
ing pathway [21]. In another case, the marker ERBB2:0 
(i.e., normal ERBB2 copy number) and the Boolean func- 
tion (1,1) (i.e., suggest in the absence of the marker) are 
assigned to the ERBB2/EGFR inhibitor BIBW2992, 



which again makes sense since ERBB2 inhibitors are 
expected to be more effective in the presence of ERBB2 
amplifications [22], However, in most instances the rela- 
tion between the assigned marker/Boolean-function and 
the known target is not obvious. The best example is the 
assignment of a tissue type as a marker, rather than the 
status of the gene coding for the target or another gene 
in the same pathway. 

Conclusions 

We have proposed a methodology that optimizes the as- 
signment of companion biomarkers to drugs to achieve 
the highest possible response rate with the minimal tox- 
icity. The outcome of our methodology is an optimal 
drug catalog, the assignment of optimal biomarkers to 
each drug and a treatment decision protocol where a 
drug is used to treat a patient when the latter is positive 
for the drug companion biomarker. The application of 
the treatment decision protocol for every drug in the 
catalog results in optimal personalized combinatorial 
therapies for every patient. 

An interest future direction is the investigation of the 
impact of drug interactions. We expect that the 
optimization approach will favor drugs that synergize with 
many other drugs in the catalog relative to those that do 
not interact or antagonize with other drugs in the catalog. 
At the end, the interplay between manifesting a high re- 
sponse rate in a group of patients and the degree of syn- 
ergy (or absence of antagonism) with other drugs in the 
catalog will determine the suitability of a given drug for its 
use in personalized combinations. The challenge will be to 
estimate of the degree of synergy/antagonism between 
current anticancer drugs. 

Our methodology is entirely based on estimated re- 
sponse rates given a marker. The latter can be estimated 
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Table 1 The catalog of drugs in the optimized personalized combinatorial therapies 





% of samples treated 


K 


f 


Markers 


Target 


c 


1 


2 


3 


3 


3 


3 


3 


Embelin 


0.7 


5.9 


31.5 


2 


13 


lung: small_cell_carcinoma ; TP53:wt 


XIAP 


Nutlin-3a 


4.1 


8.0 


24.5 


2 


11 


TP53:wt,RB1:wt 


MDM2 


Bicalutamide 


1.3 


3.9 


21.4 


2 


11 


ALK:wt,KRAS:0 


Androgen receptor (ANDR) 


XMD8-85 


1.0 


5.7 


19.5 


2 


9 


CDKN2A:wt,malignant_melanoma 


ERK5 (MK07) 


Shikonin 


1.7 


1.7 


11.8 


1 


2 


TP53:wt 


unknown 


NVP-BEZ235 


0.6 


- 


8.0 


2 


4 


PTEN:wt,EZH2:wt 


PI3K (Class 1) and mTORC1/2 


CI- 1040 


1.8 


3.8 


7.6 


2 


6 


malignant_melanoma,TP53:p.R273H 


MEK1/2 


EHT 1864 


1.1 


3.5 


7.1 


1 


2 


lung: small_cell_carcinoma 


Rac GTPases 


BMS-754807 


- 


4.1 


7.0 


2 


14 


neuroblastoma,KRAS:p.G1 2V 


IGF1R 


PLX4720 


0.8 


3.5 


6.9 


2 


13 


BRAF:p.V600E,MSH2:wt 


BRAF 


BX-795 


0.7 


5.7 


6.3 


2 


14 


glioma,KRAS:+ 


TBK1, PDK1, IKK, AURKB/C 


AhCT inhibitor VIII 


2.4 


5.6 


6.3 


2 


9 


lung: NSCLC: adenocarcinoma,EGFR:wt 


AKT1/2 


AZD6482 


3.8 


6.2 


6.3 


1 


2 


glioma 


PI3Kb (P3C2B) 


RDEA119 


3.9 


7.4 


6.0 


2 


13 


malignant_melanoma,BRCA1 :0 


MEK1/2 


MS-275 


4.9 


5.9 


5.9 


2 


6 


lung: small_cell_carcinoma ; RB1 :wt 


HDAC 


BI-D1870 


0.7 


1.7 


5.3 


2 


14 


CCND1:0,MYCN:0 


RSK1/2/3/5, PLK1, AURKB 


MG-132 


1.4 


- 


5.2 


1 


2 


glioma 


Proteasome 


FH535 


1.5 


3.6 


5.0 


2 


14 


breast,CCND1:+ 


unknown 


Docetaxel 


2.8 


1.3 


4.6 


2 


9 


upper_aerodigestive_tract,EGFR:wt 


Microtubules 


CGP-60474 


- 


2.8 


4.3 


1 


2 


CDKN2a(p14):p.? 


CDK1/2/5/7/9 


AS601245 


1.7 


2.5 


4.2 


2 


7 


ovary,osteosarcoma 


JNK 


NVP-TAE684 


1.5 


3.8 


4.2 


1 


1 


APC:wt,stomach 


ALK 


Epothilone B 


1.3 


1.4 


4.1 


2 


7 


PIK3CA:p.E545K,TP53:p.R248W 


Microtubules 


Camptothecin 


2.9 


3.8 


4.1 


2 


7 


AMLJymphoblastic T cell leukaemia 


TOP1 


Vorinostat 


4.1 


5.7 


4.1 


1 


2 


MYCN:+ 


HDAC inhibitor Class I, lla, lib, IV 


A-443654 


1.4 


3.6 


4.1 


1 


1 


SMAD4:wt 


AKT1/2/3 


PD-0325901 


1.1 


2.1 


3.9 


2 


11 


large_intestine ; VHL:0 


MEK1/2 


RO-3306 


1.5 


3.2 


3.9 


2 


11 


cervix,MYCL1:0 


CDK1 


1 7-AAG 


2.2 


2.2 


3.8 


2 


6 


STK1 1 :wt,MET:0 


HSP90 


S-Trityl-L-cysteine 


1.1 


3.5 


3.8 


1 


1 


FBXW7:wt 


KIF11 


ZM-447439 


0.1 


1.3 


3.6 


2 


7 


lung: NSCLC: large cell,RB1:- 


AURKB 


Vinblastine 


1.4 


2.2 


3.2 


2 


13 


upper_aerodigestive_tract,IDH1:0 


Microtubules 


Paclitaxel 


0.1 


2.5 


3.2 


2 


11 


oesophagus,TSC1:wt 


Microtubules 


AICAR 


0.6 


1.7 


2.9 


1 


1 


KDM6A:wt 


AMPK agonist 


BIBW2992 


1.4 


2.2 


2.9 


1 


1 


ERBB2:0 


EGFR, ERBB2 


JNK-9L 






2.7 


1 


2 


AML 


JNK 


BAY 61-3606 


1.0 


2.1 


2.7 


1 


2 


Ewings sarcoma 


SYK 


AMG-706 






2.7 


1 


2 


Ewings sarcoma 


VEGFR, RET, c-KIT, PDGFR 


AZ628 




2.2 


2.5 


2 


9 


KRAS:p.G12D,FGFR3:0 


BRAF 


BMS-536924 




1.1 


2.5 


2 


7 


KRAS:+,MDM2:+ 


IGF1R 


JW-7-52-1 


1.1 


1.7 


2.5 


1 


2 


stomach 


MTOR 


Elesclomol 


1.0 


3.5 


2.4 


2 


8 


bladder,TSC1:wt 


HSP70 


Pyrimethamine 


0.8 


3.5 


2.4 


1 


2 


pancreas 


Dihydrofolate reductase (DHFR) 
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Table 1 The catalog of drugs in the optimized personalized combinatorial therapies (Continued) 



l/IMAAl ITT 

KINUU \ -\dd 


0.3 


0.7 


2.2 


1 


1 


MET:+ 


IKKE 


Dasatinib 






2.0 


2 


1 3 


Renal cell carcinonna,NRAS:0 


ADI CDr l/IT DPir~CD 

AbL, oKL, KM, rUUrK 


A DT OOO 

Ad 1 -ooo 


1 .5 


2.5 


1 .7 


2 


1 1 


lymphoicLneoplasin other,CDK4:0 


D A DD 1 /"> 

rAnr l/z 


BI-2536 


2.9 


3.1 


1 .7 


2 


2 


LUKNzA:p.Ur,MYLl) 


Dl 1/1 /") /3 

rLKI/z/3 


IDA Q 






1 .7 


1 


2 


B cell lymphoma 


DAI/ 

rAK 


vvuzuuyuyjy/z 


1 .5 


2.0 


1 .7 


1 


2 


soft tissue other 


PI3Kb 


Methotrexate 


1 .7 


4.2 


1 .5 


2 


8 


lymphoblastic leukemia,GNAS:wt 


Dihydrofolate reductase (DHFR) 


Roscovitine 


0.1 


3.1 


1 .5 


1 


2 


Burkitt lymphoma 


CDKs 


FTI-277 


1.4 


1.4 


1.5 


1 


2 


thyroid 


Farnesyl transferase (FNTA) 


PAC-1 


1.5 


1.5 


1.5 


1 


2 


Burkitt lymphoma 


CASP3 activator 


CCT018159 


1.3 


3.5 


1.4 


2 


14 


osteosarcoma,PTEN:0 


HSP90 


PF-4708671 


0.7 


0.8 


1.4 


2 


13 


Myeloma,BRCA1:wt 


p70 S6KA 


TO 37 


0.8 


1.1 


1.4 


2 


6 


MLH1:wt,APC:0 


BCL-2, BCL-XL 


MK-2206 


1.3 


1.4 


1.4 


1 


2 


endometrium 


AKT1/2 


JNK Inhibitor VIII 






0.3 


1 


2 


AML 


JNK 


Obatoclax Mesylate 






0.1 


2 


2 


RB1:-,CDK6:+ 


BCL-2, BCL-XL, MCL-1 



Drugs used for the treatment of at least one sample for maximum combination size c=3 and pharmacokinetic variations parameter o=1. For each drug we report 
the drug-name and the percentage of samples treated with that drug (%). In the cases where the drugs were also included in the c-2 and 3 catalogs, the 
percentage of samples treated are reported as well. For the maximum combination size c=3 we also report the number of assigned markers K s , the assigned drug- 
to-sample protocol fp the assigned markers, and the drug target. 



from clinical trails testing each anticancer drug as a sin- 
gle agent, where all patients enrolled are tested for a set 
of predefined biomarkers. Using this information we can 
estimate the overall response rate given a marker, for 
each of the markers considered. In second step, we 
should select a cohort of patients where the status of all 
these biomarkers has been determined. This cohort 
could be, in principle, the union of all cohorts where the 
drugs were tested as single agents. Using the mutation 
status of each gene and the estimated response rates 
given a marker we can estimate the response rate of 
each patient in an approximate manner. With those esti- 
mates at hand we can then apply the methodology intro- 
duced here and make a prediction for the optimal drug 
catalog, the assignment of optimal biomarkers to each 
drug and a treatment decision protocol where a drug is 
used to treat a patient when it is positive for the drug 
marker. Finally, the predicted personalized combinatorial 
therapy should be tested in a two arms clinical trial to 
determine how it performs compared to the standard of 
care. 

The optimization scheme introduced here can be gen- 
eralized in several directions. We can improve the re- 
sponse rate calculation including drug interactions, 
provided the direction and the magnitude of those inter- 
actions is given. Our approach is also suitable for the in- 
clusion of genetic markers affecting drug metabolism 
[2]. These markers can be included in the optimization 
scheme as well, provided we specify a model for their 



impact on the response rate. Further generalizations are 
also needed to model toxicity. However, these general- 
izations will result in more complicated models with 
more parameters, many of which will be difficult to 
quantify. In the mean time, the simplifications intro- 
duced here allow us to implement the personalized com- 
binatorial therapies approach in the clinical context, by 
routinely sequence a subset of genes on each patient en- 
rolled in clinical trials. 

Methods 

Simulated annealing algorithm 

The simulated-annealing algorithm aims to maximize the 
overall response rate, or equivalently to minimize E=-sO, 
where s is the number of samples. The algorithm starts 
from no markers assigned to drugs (ly=(0,...,0) for all 
drugs) and explores random changes of the Yj and the 
drug- to-sample protocols fi{X,Yj)< At each step of the algo- 
rithm, a drug ; is selected and, for that drug, either a 
marker is added or removed or a new drug-to-sample 
protocol is selected. Changes are accepted when E de- 
creases, and when E increases they are accepted with 
probability exp((3(£ 0 -^))> where E 0 and E are calculated be- 
fore and after the change, respectively, and |3 is the 
"annealing" parameter. (3 is gradually increased such that, 
as the algorithm proceeds, changes increasing E are more 
likely to be rejected. The pseudocode for the simulated 
annealing algorithm implementation for our specific 
optimization problem is shown in Figure 6. 
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>Set Yj=(0,...,0) for all drugs and £=0 

>Setj3=j3o 

>For t=l to T 

> Ifmod(>,dO = OTHEN 

> J3^>fr-dj3 

> END 

> Set E 0 =E 

> IF Y r (0, ... ,0) for all drugs THEN 

> Select rule add 

> OTHERWISE 

> Select a rule from {add, remove, change J) 

> END 

> IF rule=add THEN 

> Select a drugy with at most one marker 

> Select a marker with yjf=0 

> Set yjf=\ 

> Update fj using the mapping in Figure 2g 

> END 

> IF rule=remove THEN 

> Select a drugy with^pl for at least one marker / 

> Select a marker / with yjj= 1 

> Set #7=0 

> Update fj using the mappings in Figure 2e,f 

> END 

> IF rule=change _/THEN 

> Select a drugy with #7=1 for at least one marker / 

> Assign a new drug-to-sample protocol fj(X,YJ) 

> END 

> Determine the personalized therapies using the scheme in Fig. 1 

> Compute O and E=-sO 

> IF rule=add and drugy is not selected for the treatment of any sample* 

> Reject changes 

> ELSE 

> IF E<E 0 THEN 

> Accept changes 

> ELSE 

> With probability exp [J3(E 0 -E)] 

> Accept changes 

> ELSE 

> Reject changes 

> END 

> END 

> END 
>END 

Figure 6 Pseudocode for the simulated-annealing algorithm. 

*This step is introduced to avoid the accumulation of markers in 
drugs that are not used for treatment. 



In our simulations we have chosen the parameters 
r=10,000rf, dt=d, p 0 =0 and d|3=0.01. 

The simulated-annealing algorithm can get trapped in 
drug marker assignments that are suboptimal. To overcome 
this limitation we repeat the algorithm several times and 
report the solution with minimum c. We did not observe 



significant changes from a 100 to a 1,000 repetitions. The 
results discussed below are obtained for 1,000 repetitions. 

IC50 imputation 

The missing logIC50 data was imputed using the 
weighted average over samples with available data 

5 

logIC50^e"^ /e ^^ 
lo IC50 Imputed — k=1 ^ logIC50k ^ NA 



& sample^ ikj 



Ar=l,£**|logIC50^Ak4 



(7) 



where NA denotes missing value, 



dikj '- 



(logIC50,/-logIC50^) 2 e-^^ 

1= 1 1 /*/|logIC50 l7 *NA |logIC50 w *NA , 



E 

/= 1 1 Uj | logIC50«/ *NA | logIC50« *NA , 



(8) 



is a weighted distance between samples, and 

5 

(loglCSO^-loglCSO,/) 2 

i= 1 1 logIC50, y *NA | logIC50 l7 *NA , 
d )l = i 

E 

i=l | \oglC5Qij*NA | logIC50i/*NA, 

(9) 

is the Euclidean distance between drugs based on the 
available data. The exploration of the parameters 
(.^sample) ocdmg) in the range (1-22,1-22) resulted in Pearson 
Correlation Coefficients (PCCs) between imputed and 
actual logIC50s, when available, in the range 0.83-0.89, 
with a maximum of 0.89 for {a samp i e = 20, a drug = 3). 
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